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Abstract 
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1 Introduction 



Curve fitting plays a central role in the analysis of lattice simulation data. 
One of the most important and common uses of curve fitting in lattice gauge 
theory is the extraction of hadronic masses and matrix elements from measured 
correlation functions. 

The problem of extracting the mass spectrum from the correlators measured 
in Lattice QCD simulations is well known [1,2J. The simulation produces data 
Gi = G{ti) for the expectation values of the correlator G{t) at a finite number 
of discrete (Euclidean) time steps ti, 1 < i < N. On the other hand, from 
theory the exact form of the propagator is known to be given by an infinite 
seried^ 

oo 

G(t) = EZ„e-^"*, (1) 

n=0 

where we will assume that the energy levels are ordered, En < -En+i- The 
problem is then to determine an infinite number of amplitudes Z„ > and 
energies En > from only a finite number of data points Gi, an obviously 
ill-posed problem. 

To make the problem well-posed, we have to add some further physical con- 
straints. The piece of theoretical information that is normally used is that the 
sequence of the Z„ is bounded from above, and therefore the correlator will 
be dominated by the lowest few terms at all but the smallest values of t due 
to the exponential suppression by En- We can therefore truncate the series in 
equation ([T]) after a finite number of terms, provided we only attempt to fit 

^ For periodic boundary conditions, hyperbolic functions will appear instead of the 
exponential. For a static particle, the energy is simply the particle mass. 
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Ke enouEfh for the truncation to include all terms that make a 
significant contribution. 

In doing so, we are faced with a choice: Either take a fixed number n^ax oi 
terms of the sum in ([T]) and adjust tmin so as to obtain a good fit, or fix 
tmin and vary rimax so as to extract the largest possible amount of significant 
information. The problem with the first strategy is that we are throwing away 
valuable information contained in the data points at t < tmin, leading to large 
statistical errors if tmin is chosen too big. These need to be balanced against 
large systematic errors arising if tmin is chosen too small for the given nmax PI- 
In this paper we will therefore adopt the second approach and attempt to fit 
all data points (excluding only the single point at t = for practical reasons) 
with a variable number Umax of exponentials. 

Naively, one might want to try to simply run a series of fits using a state-of-the 
art fitting method such as Levenberg-Marquardt at a number of different 
Umax, and with a variety of initial parameter values, and choose the fit that 
produces the lowest P^r degree of freedom. This method can be made to 
work in the case of one single correlator if the problem of finding an appropri- 
ate starting point in a potentially multi-modal landscape is somehow solved. 
However, for many questions in lattice QCD it is necessary to fit multiple 
correlators, which may or may not share some of the energy levels En. In this 
case, the fast combinatorial growth with nmax of the number of possibilities 
of assigning shared or separate fit parameters En to the fitting functions for 
the different correlators renders the application of this naive method to those 
problems largely impossible. The problem of choosing appropriate initial pa- 
rameter values further exacerbates this approach. 
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The current state of the art using the variable-nmax approach is that taken 
in [l] where it is used in the context of a Bayesian method [5J of constrained 
curve fitting. The latter works by adding prior information about the fit pa- 
rameters and using it to constrain the fit to a smaller subset of likely parameter 
values out of the space of all possible values. Only those parameters whose 
fitted values are largely independent of the priors used to constrain them are 
considered to be determined by the data, while the others are disregarded as 
having been imposed by the priors. 

While it is thus possible to determine which fitted quantities are independent 
of, or only weakly dependent on, the priors and thus determined from the data, 
the idea of using some external knowledge as an input could be seen as incom- 
patible with the notion of a first-principles determination of the quantities of 
interest from QCD itself, without any recourse whatsoever to empirical mod- 
els. Moreover, in some cases appropriate priors may not be available. Under 
those circumstances, it becomes desirable to be able to extract some estimate 
of the parameters to be fitted from the data themselves, and to use this es- 
timate as a prior in the context of a Bayesian constrained fit. A number of 
methods to do this, such as the sequential empirical Bayes method [6J, have 
been used in the existing literature. 

A completely different state-of-the-art approach that does not rely on prior 
information while allowing for extraction of a spectrum from multiple sets of 
data for improved statistics is the variational method [7]I5|U] . Here one sets out 
by fixing a channel corresponding to a specific set of quantum numbers, and 
finds a set of appropriate linearly independent operators Oj for the channel. 
One then calculates all diagonal (OjOj) and off-diagonal {OiOj), i ^ j, corre- 
lators between these operators, each operator having a corresponding form for 
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the sink that annihilates the state created by the operator at the source. One 
may then use the variational method on this cross-correlator matrix Cij{t) to 
find a superposition of the original operators that corresponds to the lowest 
energy state of the channel. Assuming a meson channel, the diagonal correla- 
tor of this operator may be fit by a single exponential function whose mass in 
the exponent is the ground state energy. Operators corresponding to excited 
states, whose diagonal correlators ideally correspond to functions only of the 
excited state masses, may be produced in a similar manner [10]. 

While very powerful, the variational method has a number of features that 
limit the scope of its applicability. Firstly, the use of the variational method 
is facilitated by the selection of operators such that the correlator matrix is 
hermitian |8iillj . However, in lattice simulations it is often desirable to sup- 
press unwanted but usually present contributions from high-frequency modes 
by means of quark smearing of the operators [T2]|13|14] . In this case, hermitic- 
ity of the correlator matrix requires smearing to be applied both at the source 
and the sink of the correlator, which in most cases is very expensive compu- 
tationally. Secondly, the variational approach also requires both source and 
sink operators to have the exact quantum numbers of the channel of interest, 
instead of the less stringent usual requirement that both operators overlap 
with the state of interest and that at least one of them have its exact quan- 
tum numbers. This requirement limits the selection of correlators that are 
usable with the variational method. Finally, the variational method requires 
all off-diagonal correlators to be calculated. The number of needed correlators 
thus grows as A^^, with the number of operators in the channel of interest. 
Since automated generation of group theoretical operators for a given sym- 
metry channel [TT|15|ll6j means that the number of operators that could be 
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used may be rather large, this quadratic growth in computational complex- 
ity imposes further limitations. Thus correlator selection may be done more 
judiciously without the restrictions imposed by the variational method. 

A different problem arises in the case where one desires to fit correlators cor- 
responding to channels with different lattice quantum numbers. An example 
of this occurs when the continuous rotational symmetries of a quantum theory 
get broken when discretizing it to a lattice. In the continuum, the rotational 
group has representations corresponding to arbitrary integer or half-integer 
angular momenta J, but on the lattice one only retains the finite symmetry 
of the octahedral group. A particle of angular momentum J will therefore 
appear in one or more of the lattice channels labeled by those irreducible rep- 
resentations of the octahedral group to which the value of J subduces [17j. 
Thus a particle of integral spin will appear in some known subset of the irreps 
Ai, A2, E, Ti and T2 of the octahedral group. Therefore, the extraction of 
the physical spectrum is complicated by the fact that a physical state's mass 
may lie in several lattice symmetry channels. One needs to identify in which 
irreps the physical state lies not only to aid in identifying its physical angular 
momentum J via its subduction signature, but also to extract its mass as the 
latter is contained in the data in all channels in which it appears. Since a 
state either lies within a lattice symmetry channel or it does not, the process 
of state identification and fitting requires an algorithm that is inherently dis- 
continuous. As the number of resolvable states in lattice simulations increases, 
the need for a systematic solution to this second problem will as well. 

Evolutionary fitting algorithms, while widely used in other areas of research 
|18H19f2UH21f22f23ll24H25j . are not currently in common use in lattice QCD. In 
this paper, we present an application of evolutionary algorithms [26]I27II28|29] 
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to the problem of extracting mass spectra from hadronic correlators. We be- 
lieve that the advantages of evolutionary algorithms are particularly pertinent 
to this problem: evolutionary algorithms allow dynamic variation of the func- 
tional form of the fit function (such as the number of states to be fitted) in a 
natural way so as to minimize per degree of freedom; the data themselves 
thus tell us how many states can be reliably extracted. In addition, evolution- 
ary algorithms are inherently global optimizers and as such largely eliminate 
any residual dependence of the result on the initial guesses used as starting 
values, which may be a problem when using conventional local optimizers. 
Furthermore, based as they are on the Darwinian principle of adaptation by 
mutation and natural selection [30j, evolutionary fitting methods excel at ex- 
tracting information from data without the need for any external prior infor- 
mation. Finally, evolutionary fitting is able to fit multiple correlator datasets 
with either identical or differing quantum numbers. 



In this paper we aim to introduce lattice theorists to the possibilities and fea- 
tures of evolutionary fitting algorithms and to present some preliminary results 
from our practical implementation of such an algorithm. Section [2] gives a gen- 
eral overview of the concepts and ideas of evolutionary algorithms. Section [3] 
details an evolutionary algorithm that can be used to fit a single hadronic 
correlator, and section H] outlines the adaptations to the algorithm that are 
necessary in order to fit across multiple datasets. Some possible variations of 
the basic algorithm are explained in section [5l 
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2 Evolutionary Algorithms 



Evolutionary computing and genetic algorithms [26|27] are a very active field 
in computer science and numerical optimization (cf. e.g. the review [2B] and 
references therein). By borrowing concepts from evolutionary biology, evolu- 
tionary computing is able to harness the power of natural selection by mutation 
and selective breeding for the purpose of solving function optimization and re- 
lated problems. The nomenclature is borrowed from evolutionary biology as 
well: candidate solutions are called "organisms" , whose internal encodings are 
their "genotypes" , the target function is referred to as "fitness" , and each step 
of the algorithm produces a new "generation" . 

A number of fine distinctions between "genetic algorithms" , "evolution strate- 
gies", "evolutionary programming" and related evolutionary algorithms and 
methods is sometimes employed in the literature [29] ; here we will use the term 
"evolutionary algorithm" broadly to mean any global optimization method 
that relies on some form of random mutation combined with selective breed- 
ing in a population of candidate solutions [28]. From this point of view, an 
evolutionary algorithm consists of the following ingredients: 

• A search space Q, 

• a fitness function / : ^ ^ M, assumed to be bounded from above, 

• a mutation function '■ Q ^ Q, and 

• a selection function Srj' : Q'^ 

Here is the (fixed) size of the population. Both the mutation and the se- 
lection function depend on some uncorrelated white noise r/, rj' that acts as a 
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source of randomness. The update step on a population P^. G Q is then 



P.+i = 5,,(M,(P.)). 



(2) 



The mutation function acts on each individual in a population separately, 
while the selection function performs comparisons in fitness between individ- 
uals, and may also involve crossover or "sexual" reproduction, which creates 
a new individual from a pair of "parent" individuals. 

The simplest selection procedure to use is straightforward "elitist" selection 
of the fittest Nf.iite < N individuals from a population of size N , followed by 
repopulation with their pairwise offspring (where a child is formed by random 
interpolation between the parameter values from either parent for each pa- 
rameter). More sophisticated selection procedures (such as "roulette wheel", 
"rank" or "tournament" selection [28]) could be used instead. Adding addi- 
tional, less fit, members to the elite on the basis of genetic distance may be 
helpful in maintaining genetic diversity and avoiding premature convergence. 

If we require that the mutation function leaves the fittest individual in a 
population alone, thereby ensuring 



and that the selection procedure never decreases the maximum fitness within 
a population. 



it follows that the sequence fr = maxpgp^ f{p) is bounded from above and 
monotonically increasing, and hence will converge. 

Evolutionary algorithms are inherently global optimizers, as opposed to local 




(3) 



max f(p)> max 



(4) 
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optimizers such as steepest descent or Newton methods. This global nature 
is largely due to their inherent parallelism, since all organisms in a popula- 
tion search the fitness landscape in parallel, and also because crossover allows 
information learned by different organisms to be combined and propagate 
throughout the population. Another important feature of evolutionary algo- 
rithms is their ability to optimize by varying discrete parameters (such as the 
functional form of a candidate solution) in a natural and straightforward way, 
something which local optimizers relying on certain smoothness assumptions 
about the target function cannot easily do. 

While evolutionary algorithms have not yet become a standard tool in lattice 
QCD, they have previously been used for the purpose of nonperturbative Lan- 
dau gauge fixing [HT|32II33|I31] and (in combination with an accept-reject step 
to achieve detailed balance) as a simulation algorithm In other fields 

of physics, evolutionary fitting algorithms have been used among other things 
to discriminate between different SUSY models p^j, to analyze resonances 
in p(7,i^+)A reactions [19j, to fit stellar spectra [20], to determine the mass 
loss rate of stellar winds [21], to search for extrasolar planets [221I23] . to build 
diatomic potentials using a variational method [21], and to solve black-box 
system characterization problems in engineering [2S]. In most of these cases, 
just as in this work, it has been observed that combining the evolutionary 
algorithm with a conventional (local) optimization step gave the best results. 

We believe that the flexibility and global nature of evolutionary algorithms 
makes them an excellent tool for the purpose of curve fitting, especially when 
the exact form of the fitting function (such as the number of exponentials to 
use in our case) is subject to some kind of data-dependent uncertainty. 



10 



3 Evolutionary Fitting of a Single Correlator 



In this and the following section we present some details of our version I of 
an evolutionary algorithm for fitting correlation functions in lattice QCD. 

For the purpose of fitting a diagonal meson correlatoiQ, the search space is 



jc e CO 



3 Umax >0,Zn> 0, £"„ > E^^i > : 

G{t)= Y: Z„(e-^"* + e-^"(^-*))|, 



n=0 

where T is the temporal extent of the periodic lattice. The fitness function is 
f{G) = -x\G)/ndof{G), where the correlated is 

X'{G) = EiG^ - G{m^-%m - G{t,)) , (5) 

with the covariance matrix defined by 



o"tj — GiGj — Gi Gj , (6) 

and where 

is the number of degrees of freedom of the fit given by 

An organism G E Q can therefore be represented by a list of Umax pairs (Z„, i?„), 



^ For a practical implementation, we have chosen Python [37], augmented by 
SciPy [38], because of its object-orientation and the flexible list and tuple types it 
natively provides. 

For baryonic and off-diagonal correlators, the functional form needs some adjust- 
ments. Nevertheless, the general method works the same in those cases. 
^ Where a fitness function f : Q ^ [0; 1] is desired, functions such as f{G) = 

e-xHG)/naof{G) f^Q-j ^ l/{l + x^{G)/ndof{G)) can be used instead. 



11 



and it is this representation on which the mutation and breeding algorithms 
outhned below work. 



There are two kinds of mutation steps needed to search the entire search 
space Q. The first amounts to increasing or decreasing rimax of an individual 



organism. When making these 



ength mutations it is valuable to keep the sum 



J2n of all amplitudes fixediU One reason for this is that the coefficient sum 
G{0) = J2n Zn can become relatively stable across the population early on, so 
that changing rimax by simply dropping the pair (Z„„^^,E„^^J or adding a 
random pair will tend to produce an unfit organism, regardless of whether some 
potentially desirable organism of the new length exists. Moreover, a situation 
can occur where spurious near-degenerate states are kept because the change 
in overall amplitude from just removing a single one of them increases more 
than is offset by the simultaneous increase in Udof caused by the mutation. The 
second type of mutation performs a random modification of a pair {Zn, En)- 
A natural way to implement this is the addition of a pair of independent 
Gaussian random variables to the original pair. 

Since mutations can become somewhat disruptive of already accumulated ge- 
netic information in the later stages of evolution, it may be useful to make 
the rate of mutation dependent on x^/^dof in such a way as to increase the 
search area early on, before contracting it to a more local search around the 
optima already found as the algorithm converges. 



^ This may be accomplished, for instance, by sharing the amplitude of a removed 
term between the remaining amplitudes, and by decreasing the amplitudes of the 
existing exponentials so as to keep the overall amplitude fixed when adding a new 
term. 
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Mutation steps can also be combined with a finite number of steps of a local 
optimization routine (such as Levenberg-Marquardt)[z! We have found that 
the addition of this latter step can greatly accelerate convergence by giving 
beneficial mutations an improved chance of survival^ 



Putting these ingredients together, we arrive at the following mutation algo- 
rithm: 

(I) Generate a random number p G [0; 1]. 

(II) If p < Pq^"'(1 — e~"'''"^^/"'*°-f ) (where p[f", a'^" are tunable parameters): 

(a) Generate a random number p' G [0; 1]. 

(b) If p' < {1 — l/rimax), decrease Umax by one, removing one randomly 
chosen exponential (Z„, En) from the fit, and redistributing the am- 
plitude in Zn between the neighboring exponentials. 

(c) Else increase Umax by one and add a new, randomly generated (Z„, En) 
pair to the fit, decreasing the amplitudes of the pre-existing exponen- 
tials so as to keep the total amplitude fixed. 

(III) Generate a random number p" G [0; 1]. 

(IV) If p" < p^"^"^ (where pg"'^'" is a tunable parameter): 

(a) Generate n^ax pairs of Gaussian deviates (AZ„, Ai?„) with zero 
mean and standard deviation a = cxq°'^"^{1 — 6""'""^™^^/"''°/) (where 
^parm ^parm tuuablc parameters). 



^ Specifically, we take the functional form implied by the genotype and do a fixed 

number of steps of the Levenberg-Marquardt routine using the genotype's parameter 

values as the initial values. The new genotype's values are set to the result. Some 

random subset of the parameter values can be kept fixed in the routine if desired. 
^ This kind of strategy has sometimes been referred to as a hybrid genetic or 

memetic algorithm |41j . 
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(b) Replace (Z„, En) by {Zn + AZ„, + A£'„), unless this would lead 
to a negative new value for En or 
(V) Optionally perform a local (e.g. Levenberg-Marquardt) optimization on 
the fit with probability pjj"'^"'. 

This mutation procedure depends on a number of tunable parameters. In 
a number of numerical tests on a set of synthetic data, we found that the 
point to which the algorithm ultimately converged did not depend on the 
values of these parameters, indicating good stability of the answer. The rate 
of convergence, on the other hand, did depend on the particular parameters 
chosen, although the dependence was small for "sensible" parameter choices. 
Generally, should not be too small, in order to explore the full solution 
space; we found that pjf'* = 0.5 worked well, a'^'^ did not appear to have a 
crucial influence on convergence, and was set to a'^" = 0.2 in subsequent runs. 
Since element mutations can be rather disruptive of already achieved partial 
convergence, Po"'^"* to be chosen reasonably small; we found jj^"'^"^ — 0.1 
a sensible choice. The convergence rate did not appear to strongly depend on 
^parm ^parm ^ ^j^.^j^ ^^^^ ^parm ^ g.l and (7^"^'" = 0.5, respectively. 

On the other hand, performing a local optimization step can never be harmful, 
and should be set as large as computational resources allow. It should be 
pointed out that the optimal choice of parameters will likely depend on the 
particular fitting problem investigated in each case, since the fitness landscapes 
can conceivably look very different for different data. 

The breeding or crossover function returns a child organism from two parent 
organisms (pari and par2), and works as follows: 

(I) Letn;„^^L' = nia«i,nCj). 
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(II) Generate ^Jnai^ pairs of independent uniformly distributed random num- 
bers {xi,yi) e [-6]1 +6]. 
(Ill) For n < min(n^^^, n^^J), choose the fit parameters of the child to be 
^ZchUd^E'j^iid^ = (x„ZP-i + (1 - a;„)^r',2/n^r' + (1 - 2/n)^r'); for 
other n, choose them equal to the longer parent's fit parameter values. 

The fit parameter values of the child organism are therefore chosen in a hy- 
percube spanned by the parent's fit values. Allowing for the possibility of 
extrapolation instead of interpolation by introducing a parameter 6 is neces- 
sary to avoid rapid contraction towards central points. In agreement with |18j . 
we found 6 = 0.2 sufficient to prevent this contraction. "Parthenogenesis" or 
"cloning" of an existing individual is possible by breeding an organism with 
itself. 

Putting these elements together, we arrive at the following basic evolutionary 
step to generate the next generation from a given population: 

(I) All organisms except the fittest one are subjected to mutations according 

to the mutation algorithm stated above. 
(II) Selection and breeding are carried out as follows: 

(a) The fittest Neiue organisms are selected. 

(b) Another NdiversUy organisms are selected at random in order to main- 
tain genetic diversity and avoid premature convergence. 

(c) For each possible combination of the selected organisms (including 
those containing the same organism twice), a child organism is cre- 
ated according to the breeding algorithm above, and these child or- 
ganisms form the next generation. 

(d) Nmutant copies of orgauisms from the elite are added to the population 
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and are subjected to targeted mutations as a form of local search 
around the elite. 

By not subjecting the fittest organism to mutations and including partheno- 
genesis in the breeding step, we ensure that inequalities ([HD and (jl]) are fulfilled, 
and that thus the algorithm will eventually converge. 

The basic evolutionary step is repeated until I'^dof converges below a chosen 
threshold [x^ /ndof]max, or until a chosen maximum number Tmax of generations 
has been exceeded with no improvement in the best overall genotype for the 
last Tstatic generations. 

In order to be able to get a handle on the overall stability of the evolutionary 
fit, and also as an aid in a possible parallelization, we add a final wrinkle by 
partitioning the total population into "islands" of equal size, each of which 
forms a separate population to which the basic evolutionary step is applied 
independently of the other islands. To avoid individual islands with particu- 
larly unfavorable starting conditions getting stuck, a weak coupling between 
islands is introduced by replacing the least fit organism on each island with a 
randomly chosen immigrant from a randomly chosen island with probability 
^mig ]-,gfQj.g carrying out the selection step. We find that the lowest-lying states 
are identified rather consistently across all islands fairly early on; only for the 
most massive states discrepancies in number, energy and amplitude are found 
between different islands, sometimes even in late generations, indicating that 
the identification of these states is somewhat uncertain. 

Our main program thus employs a multi-island ecosystem for the evolutionary 
optimization and then tries to improve on the results of the best evolutionary 
fit by performing a Levenberg-Marquardt optimization upon it. If sufficiently 
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many islands are being used, it is possible to do this as a constrained fit where 
desired, with the average and standard deviation of the parameter values de- 
rived from bootstrapping |l2] the best island fits being employed as priors. 
Real parameter errors may be determined by bootstrapping the data config- 
urations and rerunning the fit algorithm. However, as this could in principle 
produce different functional forms for different bootstrap configurations, and 
as this procedure may be quite expensive, one may choose instead to apply a 
Levenberg-Marquardt fit to the bootstrap configurations using the final func- 
tional form obtained from the evolutionary algorithm, as one would do in a 
conventional fit. 

We have settled on the latter procedure for our implementation, and have 
found that the final Levenberg-Marquardt fit always returned within errors 
(as estimated by the bootstrapped Levenberg-Marquardt procedure) of the 
evolutionary fit, indicating that the evolutionary fit was already close to opti- 
mal. The errors from the bootstrapped Levenberg-Marquardt fit were usually 
(though not always, depending on how well genetic diversity between islands 
was preserved in each case) comparable to those estimated from the difference 
between the fits obtained on different islandsQ In the following, we use the 
error estimates from the final bootstrapped Levenberg-Marquardt fit as our 
estimate of the error in the fitted parameters. 

To demonstrate the viability of our method, we have run a fit on two sets 
of synthetic data consisting of 200 artificial correlators for 48 timesteps, each 
constructed from a signal consisting of a sum of exponentials with known 

^ For the purpose of these single correlator tests, we used a large ecosystem with 
100 islands. 
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Fig. 1. Result of a fit to a set of pion-like synthetic correlators. Shown is a plot of 
the mass against the excitation number of the state. The horizontal lines are the 
input values used to create the synthetic data; the data points and errors are the 
fit results. 

masses and amplitudes, and Gaussian noise. In one case ("pion-like", figure [I]) 
the amplitude of the noise scaled linearly with the signal, in the other ( "rho- 
like", figure [2]) the noise amplitude was kept constant. It can be seen that the 
ground state mass and the mass of the first excited state are extracted with 
high reliability, while for the higher states good estimates are obtained. The 
increased error in the excited states is largely due to the noise component in 
the created data which makes it impossible to adequately resolve those states 
and not to a shortcoming in the algorithm itself. 

The running time for our PYTHON implementation of each of the two boot- 
strapped fits was on the order of one to two hours on a single-processor Pen- 
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Fig. 2. Result of a fit to a set of rho-like synthetic correlators. Shown is a plot of 
the mass against the excitation number of the state. The horizontal lines are the 
input values used to create the synthetic data; the data points and errors are the 
fit results. 

tium 4 workstation; implementing the same algorithm in a compiled language 
could cut this runtime down further. It should be stressed that the evolution- 
ary fit works robustly without any human intervention, thus saving valuable 
human time at the expense of a moderate increase in computer time when 
compared to more conventional excited state fitting methods that often re- 
quire a larger degree of user intervention. 

Finally, in figure [3] we show the results from fits to actual pion and rho cor- 
relators from a quenched simulation using Wilson quarks, demonstrating the 
ability of our program to deal with real lattice data. 
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Fig. 3. Result of a fit to an actual meson correlator from a quenched simulation 
using Wilson quarks. The simulation {N^onf = 100) was run on a 16^ X 32 lattice 
at /3 = 6.0, K = 0.154. The red points and error bars are the results of the fit; the 
blue bands are the approximate masses of the continuum states corresponding to 
the lowest and first excited states in the fitted channels extrapolated to the quark 
masses employed in the simulation. The largest fitted state does not correspond to 
a known continuum state and may be a lattice artifact. 

4 Evolutionary Fitting of Multiple Correlators 

In this section we discuss the generalizations of the evolutionary fitting algo- 
rithm required for fitting multiple datasets simultaneously. A more sophisti- 
cated genotype is required in this case, since now the same energy En can 
occur throughout some subset of the datasets. Hence, the fit for dataset 
number i will now be represented by a list of n^^^ coefficients {Z^\l^^), 
n = 1, . . . , n^g^^ where / G {!,..., nimax} is an integer index into a list of nimax 
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energy states Em indicating to which state the coefficient is associated]^ The 
hst El . . . Em^^^ is common to all datasets. 

In summary, for a fit of imax datasets the complete genotype is of the form: 
Fit Genotype 

= (Dataset coefficients, Mass list) 

= ((Dataset 1 coefficients, . . . , Dataset imax coefficients). Mass list) (8) 



with 



Dataset i coefficients = {{z['> , ), • • • , {Z% , 1% )) 



'^max '^n 

Mass list = (El,..., E^_J . (9) 

Assuming the datasets are not correlated in any way, the new is simply 
the sum of terms of the form in equation ([5]), one per dataset. The number of 
degrees of freedom changes from the form of equation ([71) to: 

ndof{G) = ndata - rrimax - J2 ^max (10) 

i=l 

where Udata now counts all included timesteps in the fit of all datasets. 

The presence of integer variables in the multi-dataset genotype requires some 
adaptations to the breeding algorithm used. For practical reasons, it is ad- 
vantageous to use a fixed-length integer implementation with n-bit integers, 
where 2" is the maximum number of distinct states we will allow in our fit. 
Crossover can be performed by exchanging the first m {0 < m < n) bits of 
two integers. Mutations are implemented by fiipping each bit of the integer 
with some fixed probability. 



The stored integer is interpreted modulo rumax to ensure it maps to an actual 
energy state. 
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The genotype has a hierarchical structure as a hst of hsts ultimately con- 
taining floating point or integer numbers. Mutation and crossover can therefore 
be structured by recursing through these list structures, applying appropriate 
mutation and crossover operations at each level. The lists can be distinguished 
by the number (fixed or variable) and type (homogeneous or heterogeneous) 
of their elements, and by whether they are ordered or unordered. Elementwise 
mutation can be done on any kind of list. Lists of variable length can also be 
mutated by removing elements or adding a random new element, and ordered 
lists may be mutated by permuting their elements. Likewise for crossover, 
building two new lists by picking from the elements of the parent lists in order 
can be done for any list. Homogeneous lists allow more general subsets of the 
parents' elements to be chosen. Elementwise crossover may also be done, in 
order for heterogeneous lists or with random pairs of the parents' elements in 
the homogeneous case. All of the different mutation and crossover operations 
that are possible in each case can have different probabilities assigned to them 
and for lists of variable length, a range of valid list lengths may be specified. 

In addition to these generic genetic operations, the multi-dataset fitting prob- 
lem benefits from some operations specific to its structure. One notes that 
the genotype of equation ([H]) allows for multiple representations of the same 
fitting function, because the coefficient integers are taken modulo the number 
of masses in the fit, because either the masses or the indices to them may be 
in different orders, and because some masses may be unused or referred to 
multiple times for a single dataset. This degeneracy can have an adverse effect 
on the convergence of the algorithm. To encourage the algorithm to work to- 
ward a single representation of the solution, we employ a reduction mutation 
which sorts the mass list and removes unused masses. It also combines coef- 
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ficients pointing to the same mass, places coefficient indices uniquely within 
the range 1, . . . , rrimax, and orders them within each dataset by the associated 
mass index. In order to favor this operation, as well as any other mutation 
that may reduce redundancies, it is beneficial to count all masses and coeffi- 
cients, whether redundant or not, in the count of the degrees of freedom in 
equation ( ITOi) . 

As in the single-correlator case, interspersing local optimization steps using 
the Levenberg-Marquardt method with the other mutations was found to be 
useful in accelerating convergence. Limiting the mutation to a fixed number 
of steps of the Levenberg-Marquardt algorithm and making it a relatively 
improbable mutation keeps the overall time evolution reasonable. 

Special care needs to be taken in the multi-dataset case when adding or re- 
moving masses, since simply dropping a mass from the list will mean that 
any coefficients belonging to the dropped mass will now become associated 
with a different mass, which tends to have a catastrophic effect on fitness, 
especially for more evolved genotypes. A careful generalization of the way in 
which amplitudes are redistributed when changing the number of masses in 
the single-correlator case is therefore necessary. This is facilitated by first ap- 
plying the reduction operation mentioned above before adding or removing a 
mass from the mass list. When removing a mass, one then loops through the 
dataset functions to see which ones have a coefficient corresponding to the 
removed mass. For any that do, the coefficient is deleted and its amplitude is 
redistributed between neighboring masses, either by increasing the coefficients 
for the closest masses already having coefficients in that dataset function, or by 
adding a new coefficient for an unrepresented neighboring mass. When adding 
a mass, at least one of the datasets receives a new coefficient associated with it. 



23 



and the coefficients of the other masses in those datasets are lowered accord- 
ingly. As a final step, a Levenberg-Marquardt optimization step is carried out 
in order to give the new mutant a better chance of survival. There is obviously 
a tradeoff between the probability of such relatively expensive mutations and 
the maximum size of the population that can be employed. 

As a test of the efficiency of the evolutionary algorithm in the multi-dataset 
case we have run fits of synthetic data. The result of such a test is shown 
in figure HI Four hypothetical diagonal meson correlators were constructed 
to contain four masses among them. One correlator had coefficients for only 
two of the masses, two had coefficients for three of the masses, and one had 
coefficients for all four masses. The four datasets, with 48 timesteps each, 
were modified by adding Gaussian noise, the amplitude of which was chosen 
small enough to allow statistical discrimination of all states. A Levenberg- 
Marquardt fit to the synthetic data, using the masses and coefficients used to 
generate the data as the starting point, is shown in the last column of figure HI 
As expected, the parameters shifted only marginally from their ideal values 
and l^dof for the fit is very close to 1. 

The multi-dataset algorithm was then run on the synthetic data. It was re- 
stricted to fit genotypes containing a maximum of eight masses with pos- 
itive masses of value less than 10. Each dataset function was allowed to 
have up to six coefficients whose values were only restricted to being pos- 
itive. Four islands were used, each containing 120 individuals (specifically 
N elite = Ndiversity = 5, Nmutants = 20). Figure H dispkys the best fit geno- 
type from each generation along with its x^/^dof- It is evident that the algo- 
rithm quickly succeeds at finding a good fit to the data. By generation 20 one 
has a fit with good x'^/^dof and characteristics almost identical to the model 



24 





i 1 


1 1 1 1 1 1 1 1 1 


1 1 1 


1 


1 1 1 


1 1 1 1 1 1 1 1 1 1 1 


1 




10^ 


— • 

: A 


• 

• 

• 






• 

V 
A 
O 
□ 


Dataset 1 Coefficients ( x 
Dataset 2 Coefficients ( x 
Dataset 3 Coefficients ( x 
Dataset 4 Coefficients ( x 
Masses 


10000 ) 
1000) 
100) 
10) 


- 

^ 






: : : 


















: A ■ V ■ ■ 

: ' : ^ iv 

■ : A : : ^ 


V 








- 


lo' 


; ^ 


:a :^ :a :a 
: A :a A A □ 
: : : : 
: □ : □ : : 


: ^ A 
: A 


A A 


\ \ \ 


\ 

A 


1 




= 


: : : ° 

jO '0 . ^ . 

1 jo * ; 


.0 o ^ 








~= 






'i' □++ |D 
: * : + : [ft- :cP 
: : □ : : 


tn °* 








1 


1 


i 


; ; ; ; + 

:+:+)- :++:++ 
:□ : : : 

f t+ ft 


: *^ 

:++ •+ o» 

it it t 


■ 


If*** 

i 


+ + + 
t t t 
• + C»+ • + 

it jt it 

i : : 1 


+ 

i 

i 





1 10 100 1000 Model 

Generation 

Fig. 4. Best fit genotype and its x^/^dof of each generation for a multi-dataset fit 
are shown. Four datasets were constructed to contain four masses with each dataset 
having coefficients for two to four of the masses. The model fit that generated the 
data is shown on the right. Each fit is displaced horizontally with different masses 
in separate columns with their corresponding coefficients in each of the datasets 
vertically aligned above them. The latter have been multiplied by powers of 10 for 
display purposes. The left side of the plot shows the results of the evolutionary 
algorithm, starting with the best fit of the first generation. Only generations with 
an improvement in the best fit genotype are plotted. For clarity a few of these fits 
have only their x^/i^'dof displayed (open circles) to ensure that all data between 
subsequent drop lines corresponds to a single fit. 



function, having the same number of masses and corresponding coefficients, 
except for the addition of a small (four orders of magnitude lower) coefficient 
added in the ffist dataset for the highest mass. Generation 64 shows that this 
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functional form with an additional coefficient actually improves the fit, having 
a lower /"^dof than the optimized model function. Generations 100 onward 
improve even further in the fit by bifurcating the lowest mass to produce a fit 
of ever so slightly greater probability than the actual model function used to 
generate the data. This was the final result as of generation 2000. The total 
run was performed overnight on a single-processor Pentium 4 workstation. 

Overall one notices that the algorithm adds masses initially as required before 
proceeding to coalesce masses and combine coefficients to improve the fit. The 
reduction mutation serves to encourage an ordering of masses from lowest to 
highest but generations 2 to 10 show it is not strictly required. 

We are currently employing this program in a forthcoming analysis [13] of the 
meson spectrum of twisted-mass QCD [44J based on the representation theory 
of the octahedral group with generalized parity [13]. In figure 0, we show the 
results of a fit to actual data from the Wilson QCD action in the pion channel. 
The algorithmic parameters were the same as for the previous fit. Again, one 
observes that initially the number of states found fiuctuates considerably, with 
states being added to improve the fit. At some point, too many states which are 
nearly degenerate have been introduced, and the population culls unnecessary 
bifurcations. One observes that by generation 200 the energy states have been 
largely found, further improvements occurring within the coefficients. The 
fitness of the best genotype traces this behavior] 

As an aside we note that for this fit and the fit to synthetic data shown in figured! 
the restriction En < 10 was initially imposed with an eye to restricting ourselves 
to physical masses and thereby preventing the occurrence of "runaway" solutions 
commonly encountered in this fitting task. We subsequently found this restriction 
to be unnecessary. We conjecture that the problem with runaway solutions may 
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Fig. 5. A simultaneous fit to actual data of eight diagonal pion (i.e. A^*^ = A^^) 
correlators from a quenched simulation using Wilson quarks {(3 = 6.0, n = 0.1554, 
20^ X 48, 600 configurations) is shown; the eight operators used are from 05]. Only 
the energies of the fittest organism of each generation are shown; the coefficients 
in each dataset, a further 28 parameters in the final fit, are not. The last column 
depicts the best fit found with bootstrap errors produced via Levenberg-Marquardt 
fits with its fixed functional form. Also shown (circles) is (the logarithm of) x^/^dof 
of the best genotype, which indicates that by generation 25 one has technically a 
good fit {x^/udof « 1). 



To get an impression of how well the algorithm converges, we show histograms 
of x^/^dof values of the final best fit for 160 twisted-mass meson channels in 



be due to trying to fit a poor functional form to the data, something which our 
algorithm avoids, thus removing the need for such a constraint. Subsequently, we 
merely constrain parameters to be positive. 
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figure El The 160 channels correspond to four quark masses (the mesons con- 
sist of mass-degenerate quarks and antiquarks), two particle types (charged or 
neutral), and 20 octahedral irreducible representations (A^*^). The 400 opera- 
tors used (16 local and 384 extended) are detailed in [33]. Diagonal correlators 
were obtained for all of these operators, encompassing all channels. For opera- 
tors in representations of dimension greater than one, all the correlators were 
row-averaged before fitting. The quenched simulation used degenerate twisted 
mass quarks with quark and link smearing of operators on a 20^ x 48 lattice at 
(3 = 6.0, m ~ 771,^,171,^/2,171,^/ 3, rris/Q (where iris is the physical strange quark 
mass); see [m|47ll48j for further details. 

For each channel the evolutionary algorithm was run three times: first using all 
of the data {Neon fig = 600 configurations), then one third of it {Neon fig = 200), 
and finally one sixth of it {N^onfig = 100). The fitness distribution for the 
full data set is excellent with x^/^dof distributed locally about 1.0, clearly 
demonstrating the algorithm is robust in its ability to find a good fit, when 
the latter exists. As the amount of data included drops, however, the certainty 
with which states are resolved falls and the optimal fit becomes poorer. Fits 
required a minimum of 600 generations, and ran for up to 1200 generations 
if improvement still occurred in the best organism. The shown fits are for 
correlators with smeared operators. Unsmeared correlators (not shown) of 
the same operators which exhibit a greater number of excited states were 
also fit for the 600 configuration case with no appreciable difference from the 
histogram of the smeared one. 

As a measure of stability of these fits, bootstrap errors of the fitness, a^2 /^^of > 
have been calculated for each fit and their histograms are also shown. Specifi- 
cally, for each fit, 3 x Neon fig bootstrap configurations of the data were made 
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Fig. 6. Shown are normalized histograms of X^/^dof (thick Unas) of fits to 160 
meson quantum channels containing between 2 and 16 correlators each. For each 
channel the evolutionary algorithm was run three times: using all {Neon fig = 600 
configurations), one third {Neon fig = 200), and one sixth {N^onfig = 100) of the 
lattice data. For the full data set, the fitness distribution peaks sharply around 
1.0, but as the amount of data included drops, the certainty with which states are 
resolved falls and the optimal fit becomes poorer. Bootstrap errors of the fitness, 
cr^2 , are also shown (thin lines) . The stability of the fit is seen to decrease in 
the same way as the overall 'X^ jn^of as the quality of the data declines. 



and the functional forms of the best fits were fit to them using a Levenberg- 
Marquardt routine whose initial values were those of the best fit. For the full 
600 configurations one observes the variation in the goodness of the fit is small 
{(y-x^lnios < Since good fits occur in essentially all of the bootstrap con- 
figurations, this increases confidence that the spectrum found is accurate. The 
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Fig. 7. Shown is the execution time (on a single Intel Xeon 3.06 GHz CPU) per 
parameter as a function of the number of parameters in the final best fit. The latter 
includes each of the masses as well as the coefficients on each dataset for the given 
fit. The number of generations for all fits shown was fixed at 600, by which time 
all fits were already well converged. Smeared correlators (squares) exhibit a smaller 
number of fitted parameters than unsmeared ones (circles), which is expected since 
smearing suppresses contributions from excited states. 

same is largely true for 200 configurations as well {(Jx^/naof < 0-6) > but by the 
time the data set is reduced to only 100 configurations the error in I'^dof is 
as wide as the value itself, indicating along with the fit histogram, that the 
amount of data is insufficient for solution of the problem. 

Finally, in figure [7] we show the runtime per parameter as a function of the 
total number of parameters (masses and coefficients in all datasets) in the final 
fit. Overall, we find that the dominant contribution to runtime comes from 
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the number of coefficients, which in turn is largely proportional to the number 
of correlators used for the fit, as might be expected from a naive runtime 
analysis. 

5 Alternatives and Variations 

The specific implementation described in the previous section is just one of 
many possible ways to approach the problem of extracting excited state ener- 
gies using evolutionary algorithms. In this section we wish to point out some 
alternatives to our current implementation, many of which we are actively 
exploring at the moment. 

The previously described algorithm used a genetic encoding based on real 
numbers. This offers the advantage of being a straightforward representation, 
as well as the ability to interject local optimization steps in a natural manner. 
An alternative would be to use an integer-based representation of real param- 
eters instead; the advantage in this case would be the ability to use bit-based 
mutation and crossover operations instead of our Gaussian mutations and in- 
terpolating crossover. The bit-based operations, besides likely being faster in 
most cases, are better understood in terms of rigorous theorems regarding 
global convergence properties (such as from schema theory [2SI2H]), but do 
not offer the possibility for easy mixing with local optimization. 

Using straight elitism as the selection method has been found to be favorable 
in the case of problems with a real- number genetic code [18], but introduces 
a certain risk of premature convergence if the elite should happen to cluster 
around a local optimum. This risk is significantly reduced by the addition of 
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random survivors and our use of an island-based ecosystem, but other, more 
sophisticated, selection methods could help to further eliminate any remaining 
bias. Another disadvantage of elitist selection is that it requires a full sort of 
the organisms by fitness in each generation, which makes up a fair part of 
our implementation's computational cost. Other selection methods manage 
to avoid this requirement, which could lead to further gains in speed. The 
specific mutation schedule implemented in our algorithm is also to be seen 
as just one example out of many that are possible. In some cases, it may be 
more favorable to mutate each parameter separately instead of mutating all 
parameters at once. 

Our evolutionary algorithm depends on a number of parameters, such as the 
rate for different kinds of mutations, their dependence on x^/ndof-, and the 
size of the breeding pool. In the current implementation, these parameters 
have been set to reasonable values by hand. For a more highly optimized 
implementation, these parameters should be tuned to values that tend to 
give the fastest rate of convergence towards the true optimum; in principle, 
such tuning itself could be done by means of another evolutionary algorithm, 
although that approach might prove to be fairly expensive computationally. 

Evolutionary multi-modal algorithms (see [IH] and refs. therein) are able to 
find not only absolute extrema but relative extrema as well. "Niching" and 
"diversity preservation" algorithms have been devised that dissuade too many 
elements of the population from going after the single best solution, thereby 
finding not only the best solution but other good solutions. This could be 
useful in the context of fitting since these algorithms can produce the best fit 
as well as other fits that lie in relative extrema that are perhaps comparable 
with the best fit. Comparison of such fits would give the researcher a better 
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feel for the uniqueness and likelihood of the functional form of the solution 
found. 

Several improvements might be made for the simultaneous fitting of multiple 
correlators. Obviously it is of value to cull from the fit any datasets which 
clearly only contain noise. As well, if fitting a large number of datasets it 
could be of value to partition the datasets and find viable fits on each subset 
first. One could then merge these populations into genotypes suitable for the 
entire dataset by stitching together fits from each subset. This could be done 
crudely by just putting the masses back to back, or one could implement 
some algorithm which tried to find common masses at this point between two 
genotypes being merged in some systematic fashion. Running the algorithm 
on these new datasets would then optimize these fits globally, presumably by 
coalescing common masses across the subsets that are statistically equal to 
improve the final fit. 

The local optimization steps used as mutations in our algorithm could be ren- 
dered more efficient by employing techniques that exploit the partial linearity 
of the functional form of equation ([T]) by separating the linear and non-linear 
variables [50] . 

There is also the possibility to combine the variational method with an evo- 
lutionary fitting algorithm. To do this, one could diagonalize the correlator 
matrix as usual with the variational method, and then use the evolutionary 
algorithm to fit the resulting diagonal correlators. In this way, any remain- 
ing mixing between the optimized operators could be detected and quantified, 
while at the same time reducing the number of correlators to be fit. 

An alternative to evolutionary algorithms, which we have not investigated so 
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far, might be the use of Markov Chain Monte Carlo optimization methods 
such as simulated annealing [51j, which share evolutionary algorithms' ability 
to accommodate discrete changes in functional form. 

6 Conclusions 

Evolutionary fitting methods provide an interesting and useful addition to 
the lattice field theorist's data analysis toolkit. Especially when combined 
with other well-known and well-tested fitting methods, evolutionary fitting 
can help to extract information from simulation data without having to im- 
pose any external constraints, such as Bayesian priors. Evolutionary methods 
allow one instead to extract all of this information from the data themselves 
by harnessing the globally optimizing nature of evolving systems. This is par- 
ticularly true in the case of discrete parameters such as the number of states 
to fit, which are hard to determine using more conventional methods. 

We have demonstrated a working method for the extraction of excited state 
masses from lattice QCD correlators using evolutionary fitting. We believe that 
evolutionary fitting algorithms have significant potential as a data analysis 
method in lattice QCD, and that further investigation in this direction is 
warranted. 
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